Empirical validation of integrated stock assessment models to ensure risk equivalence: A pathway to resilient fisheries management

The Precautionary Approach to Fisheries Management requires an assessment of the impact of uncertainty on the risk of not achieving management objectives. However, the main quantities used in management metrics, such as spawning stock biomass (SSB) and fishing mortality (F), cannot be directly observed. This requires the use of models to provide guidance, for which there are three paradigms: the best assessment, model ensemble, and Management Strategy Evaluation (MSE). It is important to validate the models used to provide advice. In this study, we demonstrate how stock assessment models can be validated using a diagnostic toolbox, with a specific focus on prediction skill. Prediction skill measures the accuracy of a predicted value, which is unknown to the model, in relation to its observed value. By evaluating model predictions against observed data, prediction skill establishes an objective framework for accepting or rejecting model hypotheses, as well as for assigning weights to models within an ensemble. Our analysis uncovers the limitations of traditional stock assessment methods. Through the quantification of uncertainties and the integration of multiple models, our objective is to improve the reliability of management advice considering the complex interplay of factors that influence the dynamics of fish stocks.


Introduction
The main objectives of fisheries management are to ensure that stocks provide the maximum sustainable yield (MSY) and are maintained with high probability above a point where productivity is impaired. Therefore, the provision of fisheries management advice requires the assessment of the state of the stock relative to target and limit reference points, the prediction of the response of the stock to management, and the verification that the predictions are consistent with observations. However, the main quantities of interest, spawning stock biomass (SSB) and fishing mortality (F), cannot be observed. Therefore, models with latent variables are required to assess the state of the stock, derive reference points, and propose management actions.
Currently, the primary diagnostics used to select and reject assessment models are to examine residuals to verify goodness of fit and to perform a retrospective analysis to verify stability. However, residual patterns can be removed by adding more parameters than justified by the data, and retrospective patterns by ignoring the data [1]. Validation using empirical data plays an important role in sustainability science [2], and models must be validated if they are to provide robust and credible advice [3]. This requires assessing whether it is plausible that a system equivalent to the model generated the data [4] and whether assumptions are violated. Therefore, an alternative to residual and retrospective analysis is to perform a hindcast by omitting recent observations and then predicting their out-of-sample values [5]. Prediction skill is a measure of the accuracy of a predicted value unknown to the model relative to its observed value. Prediction skill can be used to explore model misspecification and data conflicts, to help identify alternative hypotheses, and as an objective method to select, reject, and assign weights to models [6].
To ensure that advice on the consequences of tactical and strategic management actions is robust, the Precautionary Approach to Fisheries Management [7] requires the quantification of uncertainty and a reduction in the risk that uncertainty hinders the achievement of management objectives. Ideally, risk equivalence should be considered so that objective-based management decisions can be maintained within acceptable risk levels and deliver results consistent with expectations and the trade-offs between them [8]. In the context of single-species advice, this means that in situations with poor or limited data, and consequently greater uncertainty, management should not allow greater risks, as required in tiered assessment frameworks [9].
There are three main modelling paradigms to provide advice: the best assessment, the model ensemble, and Management Strategy Evaluation (MSE). Each has its own means of determining quality that implies plausibility, but plausibility is rarely objectively defined. In the best-assessment paradigm, alternative models are fitted to historical time series of fishery-independent and fishery-dependent data, and then a single scenario is selected based on goodness-of-fit diagnostics [10]. However, there is often a lack of information in stock assessment data on system processes [11][12][13][14], and the data sets may conflict. Therefore, ensembles in which model estimates are combined, consisting of as few as two [15] or thousands [16] of models, may be preferred. The third paradigm, MSE, is a formal way of simulation-testing feedback control [17]. The aim is to design robust and fault-tolerant control systems that allow management objectives to be met despite the uncertainty represented by Operating Models that represent resource dynamics [18]. In MSE, advice may be provided by an empirical control rule using an indicator based on data rather than a stock assessment. The indicator should be able to track the status or trends in the stock, and after implementation a review should be performed to evaluate whether management objectives have been achieved. Therefore, prediction skill is valuable for selecting Operating Models, which may be conditioned on a stock assessment, for selecting indicators, and in assessments conducted as part of implementation reviews.
An assumption under the best assessment and ensemble paradigms is that model outputs quantify the consequences of the uncertainties in model inputs. This requires an uncertainty analysis, for example, in the best assessment, by sampling the values of fixed parameters from prespecified distributions, or for an ensemble by including all plausible models. In the latter case, multiple models are run for scenarios related to alternative model structures and values for prespecified parameters, and the results are combined to provide advice [19]. [20] used a set of southern bluefin assessment scenarios to cover the range of interpretations of the main uncertainties. However, multiple alternative model structures may be equally plausible and therefore the number of model scenarios required to perform a full uncertainty analysis may be infeasibly large. Instead, sensitivity analyses are generally preferred, i.e. systematic investigation of the reaction of model outputs to extreme values of the model inputs and drastic changes in the model structure. It is possible to perform a sensitivity analysis of the model around a reference case and then use it as part of a first-order uncertainty analysis [21]. In any case, an objective approach should be used for selecting, screening, and weighting hypotheses, to overcome artefacts and biases introduced by a "cherry picking" approach [22]. This is particularly important since divergent views and beliefs mean that uncertainties can be used to support stakeholder positions and to strengthen or weaken management measures [23].
Providing probabilistic advice involves determining the uncertainty of the model output derived from uncertain inputs and assumptions of the model [24]. Increasingly, multiple models are used to develop advice [19,25], either combined to make probability statements or as Operating Models in MSE to represent alternative hypotheses reflecting uncertainty about resource dynamics to which management should be robust. Multiple models may be combined using an ensemble that treats each model scenario as an alternative hypothesis and either implicitly recognises that each may explain the data equally well or weights each based on an estimate of plausibility [26,27]. In the case of the best assessment, uncertainty is based on confidence or credible intervals, whereas in an ensemble estimates are combined across models. Although ensembles can improve model predictions, they must themselves be validated, formed from a diverse set of models to minimise redundancy, and built on a data set representative of the population to which they are applied [28]. In MSE, a reference set or a single reference case is developed. The reference set is a limited set of Operating Model scenarios that include the most significant uncertainties in the structure, parameters, and data of the model. Alternative scenarios should be highly plausible and have a significant impact on the performance statistics of candidate management procedures (MPs). In addition, a robustness set should be developed to assess performance across a wider range of plausible scenarios. These should represent hypotheses that are less plausible than those in the reference set and focus on challenging circumstances with potentially negative consequences that should be avoided.
In all paradigms, plausibility is rarely objectively defined. Therefore, we first explore the impact of uncertainty and then demonstrate ways to define the plausibility of alternative models by evaluating criteria based on retrospective bias and prediction skill [1]. We then compare model weighting schemes and discuss how the process can be generalised.

Material and methods
The choice of scenarios for assessment models and methods of estimating uncertainty has an impact on the risk of exceeding the limit and missing target reference points. The procedure for selecting and rejecting scenarios in all paradigms will determine the advice. To better understand the impact of uncertainty on stock assessment advice and the risk of not meeting conservation and sustainability objectives, we used the uncertainty grid developed by the Indian Ocean Tuna Commission (IOTC) for albacore tuna (Thunnus alalunga) as an example.
The data set for the Indian Ocean albacore assessment includes records of catches and landings, abundance indices based on catch per unit of effort (CPUE) and samples of length composition. The assessment partitions the Indian Ocean into four regions, divided latitudinally along the 25°S parallel and longitudinally along the 75°E meridian. Fig 1 shows the distribution of catches between the four regions. The assessment includes 11 fisheries, including an aggregated longline fishery for each region [29], and a set of standardised CPUE indices that have been derived from the longline catch and effort data provided by Japan, Korea, and Taiwan, China [30]. Area 3 is considered to represent the core of the distribution of the stock.

Uncertainty grid
Tuna Regional Fisheries Management Organisations commonly develop uncertainty grids to condition models in integrated stock assessments to account for uncertainties in parameters that cannot be estimated from the data, and data conflicts [19,31-36]. Grids consist of different plausible combinations of assumptions, fixed parameter values, and data sets. However, it is not always clear whether this is intended as an uncertainty analysis or a sensitivity analysis.
The uncertainty grid developed by the IOTC for albacore tuna (Thunnus alalunga) is a full factorial design with 1,440 model configurations [37] (Table 1). This is sufficient to provide contrast, but not so large as to be unmanageable. We used Bayesian Markov Chain Monte Carlo (MCMC) methods [38] to estimate the uncertainty of the parameters for the base case. The uncertainty grid includes multiple configurations of integrated assessment models based on current best knowledge and available data [39]. The grid was conditioned using Stock Synthesis [40].
In the Indian Ocean albacore stock case, several factors limit the ability to obtain robust model fits. These include problems with data completeness and quality [41], including, but not limited to, total catch statistics, length distribution in catches, and biological information. Therefore, the uncertainty grid (Table 1) was constructed as a full factorial design of alternative model configurations, based on parameter choices for which there is insufficient information in the data to estimate them or to decide between alternative options.
The reference case, considered by the IOTC to be the most plausible among a set of candidate models, was extended by selecting alternative values for fixed parameters and data weighting to develop the grid. Factors include i) alternative values of natural mortality (M) for juveniles (ages 0 to 4) and adults (age 5 or older); ii) two values for recruitment variability (sigmaR) of 0.4 and 0.6; iii) three values for the steepness (h) of the stock-recruitment relationship of 0.7, 0.8, and 0.9; iv) four values for the coefficient of variation in the CPUE series of 0.2, 0.3, 0.4 and 0.5; v) three values for the relative weight of length sampling data in the total likelihood through changes in the effective sample size parameter of 20, 50 and 100; vi) two scenarios for the effective catchability of the CPUE fleet, one assuming that the fleet had not improved catchability and an alternative that considered a 1.0% yearly increase; vii) two possible functional forms for the longline fleet selectivity, a logistic function (Log), where selectivity stays at the maximum level for larger sizes, or a double normal (DoNorm), where selectivity decreases for larger sizes. This resulted in a grid of 1,440 individual models, which covers most, but not all, plausible sources of uncertainty.
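The full factorial structure can be illustrated with a short sketch that enumerates every combination of factor levels. The levels for sigmaR, steepness, CPUE CV, effective sample size, catchability and selectivity are those listed in the text; a total of 1,440 configurations then implies five levels for the natural mortality factor if the design is fully crossed. The five M values and the short factor names used here are placeholders chosen for illustration only and are not taken from Table 1.

```python
from itertools import product

# Factor levels of the uncertainty grid. The M levels are ASSUMED placeholders
# (the text implies five levels but does not list them here); the remaining
# levels are those given in the text. Factor names are hypothetical.
grid_factors = {
    "M": [0.20, 0.25, 0.30, 0.35, 0.40],  # assumed levels, illustration only
    "sigmaR": [0.4, 0.6],
    "steepness": [0.7, 0.8, 0.9],
    "cpue_cv": [0.2, 0.3, 0.4, 0.5],
    "ess": [20, 50, 100],
    "q_increase": [0.0, 0.01],            # no change, or 1% per year
    "selectivity": ["Log", "DoNorm"],
}


def build_grid(factors):
    """Return one dict per scenario for a fully crossed (full factorial) design."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


scenarios = build_grid(grid_factors)
print(len(scenarios))  # 1440
```

Each element of `scenarios` is one model configuration, so diagnostics such as Mohn's rho or MASE can later be stored alongside the factor levels that produced them.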
Model estimates. Provision of fisheries management advice requires the assessment of stock status relative to reference points to prevent growth, recruitment, economic and target overfishing. Growth and recruitment overfishing are generally associated with threshold or limit reference points, while economic overfishing can be expressed in terms of targets or limits [42]. The difference between targets and limits is that indicators may fluctuate around targets, but in general limits should not be crossed. Target overfishing occurs when a target is overshot, while variations around a target are not necessarily considered serious unless a consistent over or undershoot becomes apparent. In contrast, even a low probability of violating a limit reference point (LRP) may indicate the need for immediate action. F_MSY is often considered a limit, and thresholds or triggers can also be implemented to initiate a management action.
Patterns or fluctuations generated by a model have an impact on advice [43]. Therefore, two key properties of the output of the assessment model are the production function and the process error. The former is used to calculate analytical reference points [44], while the latter is a source of additional variability that is not represented by the main structure of the model [45]. Process error may be due to variations in biotic or abiotic processes; that is, the drivers of population fluctuations that ecologists are interested in quantifying [46]. Process errors arise when a deterministic component of a population model incorrectly describes population processes. Such process variation can be found, for example, in recruitment, fishery selectivity, or sampling processes [47].
If the stock size is represented by biomass and changes according to the time-accounting equation [48]

$$B_{t+1} = B_t - C_t + P(B_t)$$

where $B$ is exploitable biomass and $C$ is catch, then surplus production $P(B_t)$ is the net change in biomass if $C = 0$, and represents the net addition to biomass due to the recruitment of fish too small to be taken into account in $B$, plus growth minus loss to natural mortality.
In integrated stock assessment models, the process error $\varepsilon_t$ can be estimated as the difference between the deterministic expectation and the stochastic realisation for biomass $B_{t+1}$, i.e.

$$\varepsilon_t = B_{t+1} - \left(B_t - C_t + P(B_t)\right)$$
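As a minimal sketch of this calculation, the process error can be computed from a biomass and catch series once a surplus production function is available. Two assumptions are made here for illustration: a Schaefer production function stands in for the production function implied by the integrated model, and the error is taken as the realised biomass minus the deterministic expectation on the absolute scale. The function names and numbers are hypothetical.

```python
def surplus_production(biomass, r, k):
    """Schaefer surplus production P(B) = r * B * (1 - B / K).

    ASSUMPTION: a stand-in for the production function of the assessment model.
    """
    return r * biomass * (1.0 - biomass / k)


def process_error(biomass, catch, r, k):
    """Realised biomass B[t+1] minus its deterministic expectation.

    The expectation is B[t] - C[t] + P(B[t]); the sign convention
    (realised minus expected) is an assumption made for this sketch.
    """
    errors = []
    for t in range(len(biomass) - 1):
        expected = biomass[t] - catch[t] + surplus_production(biomass[t], r, k)
        errors.append(biomass[t + 1] - expected)
    return errors


# Illustrative values only, not assessment estimates.
biomass = [1000.0, 950.0, 930.0]
catch = [100.0, 80.0]
errors = process_error(biomass, catch, r=0.2, k=1000.0)
print(errors)
```

A positive value indicates that the stock was more productive in that year than the deterministic production function predicts, and a negative value that it was less productive.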
Full details of the methods in the next section are provided in Supporting Information.
Goodness of fit. Goodness of fit involves assessing the fit of the model through the examination of residual patterns to identify any systematic misfits, such as bias or trends, that could indicate misspecification of the model. Simple residual plots are used along with statistical tests, including the Runs test [49], to detect deviations from expected patterns. Information-theoretic criteria, such as Akaike's Information Criterion (AIC) and its variants (AICc for small sample sizes) [50], GIC [51], DIC [52] and WAIC [53], are often applied to select models that best balance fit and complexity, considering both frequentist and Bayesian approaches. However, since the scenarios included different data weightings due to data conflicts, information-theoretic criteria cannot be used for weighting in such cases.
Model consistency. We evaluated model consistency through retrospective analysis, specifically using Mohn's rho (ρ_M), to measure systematic errors over time as data are sequentially omitted from the analysis [54]. This approach helps identify biases that could affect management decisions.
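A minimal sketch of the retrospective statistic is given below, assuming the commonly used formulation in which Mohn's rho is the mean relative difference between the terminal-year estimate of each retrospective peel and the estimate for the same year from the model fitted to all data. The data structures and values are hypothetical.

```python
def mohns_rho(reference, peels):
    """Mean relative bias of terminal-year estimates across retrospective peels.

    reference: dict mapping year -> estimate from the model fitted to all data.
    peels: list of dicts (year -> estimate), one per peel, each ending in the
           terminal year of that peel.
    """
    relative_bias = []
    for peel in peels:
        terminal_year = max(peel)
        ref_value = reference[terminal_year]
        relative_bias.append((peel[terminal_year] - ref_value) / ref_value)
    return sum(relative_bias) / len(relative_bias)


# Illustrative SSB estimates only, not assessment results.
full_fit = {2018: 100.0, 2019: 90.0, 2020: 80.0}
peels = [
    {2018: 100.0, 2019: 99.0},  # last year of data removed
    {2018: 110.0},              # last two years of data removed
]
rho = mohns_rho(full_fit, peels)
print(round(rho, 3))
```

In this toy example each peel overestimates its terminal year by 10%, so the statistic is positive, which would indicate a consistent tendency to revise the estimates downward as data are added.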
Model validation. Validation focused on the model's ability to predict unseen data, employing hindcasting in which data points are removed using a tail-cutting approach, i.e. removing data sequentially from the most recent years backward, and are then predicted by the model [1]. This was primarily applied to catch-per-unit-effort (CPUE) data due to limitations in data availability, especially from regions beyond national jurisdiction. The prediction skill of the model was quantified using the Mean Absolute Scaled Error (MASE) [55], comparing the model forecasts with a naïve forecast over specified forecast horizons. MASE values less than 1 indicate predictions more accurate than the naïve forecast, offering a clear criterion to evaluate model performance.
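The MASE calculation for a hindcast can be sketched as follows, assuming that the naïve forecast is a random walk, i.e. the last observation available before the data were removed. The function name, the indexing by position in the series, and the CPUE values are illustrative and are not taken from the assessment.

```python
def mase(observed, predicted, horizon=1):
    """Mean Absolute Scaled Error of hindcast predictions.

    observed: list of observations in time order.
    predicted: dict mapping index t -> model prediction of observed[t], made
               with the observations from t - horizon + 1 onward removed.
    The naive forecast for index t is observed[t - horizon] (ASSUMED random walk).
    """
    model_error = sum(abs(observed[t] - p) for t, p in predicted.items())
    naive_error = sum(abs(observed[t] - observed[t - horizon]) for t in predicted)
    if naive_error == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")
    # Both sums run over the same indices, so their ratio equals the ratio of means.
    return model_error / naive_error


# Illustrative CPUE index and one-step-ahead hindcast predictions.
cpue = [1.0, 1.2, 0.9, 1.1, 1.3]
hindcast = {2: 1.0, 3: 1.0, 4: 1.2}
score = mase(cpue, hindcast, horizon=1)
print(round(score, 3))
```

Here the model errors are smaller than those of the random walk, so the score is below 1 and the model would be judged to have prediction skill for this index.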
The Diebold-Mariano test can be applied to compare the predictive accuracy of our model against a naïve benchmark, providing a statistical basis to evaluate the significance of differences in predictive performance [56].
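A simplified sketch of such a comparison is shown below. It assumes an absolute-error loss, one-step-ahead forecasts (so that no autocovariance terms are needed in the variance of the loss differential) and a normal approximation for the test statistic, which may be poor for the short series typical of hindcasts. The function name and the error values are hypothetical.

```python
import math


def diebold_mariano(model_errors, naive_errors):
    """Diebold-Mariano statistic for one-step-ahead forecasts, absolute-error loss.

    Returns the statistic and a two-sided p-value from the normal approximation.
    ASSUMPTION: horizon of 1, so only the lag-0 variance of the loss
    differential is used; longer horizons need autocovariance terms.
    """
    d = [abs(m) - abs(n) for m, n in zip(model_errors, naive_errors)]
    n_obs = len(d)
    d_bar = sum(d) / n_obs
    variance = sum((x - d_bar) ** 2 for x in d) / n_obs
    statistic = d_bar / math.sqrt(variance / n_obs)
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value


# Illustrative forecast errors (observed minus predicted), not real results.
model_err = [0.10, -0.20, 0.10, 0.05, -0.10, 0.15]
naive_err = [0.30, -0.25, 0.20, 0.30, -0.20, 0.10]
dm_stat, dm_p = diebold_mariano(model_err, naive_err)
print(round(dm_stat, 2), round(dm_p, 4))
```

A negative statistic indicates that the model has the smaller average loss, and a small p-value suggests that the difference from the naïve benchmark is unlikely to be due to chance alone.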

Time series analysis
Time series of the yield, fishing mortality, and SSB relative to their corresponding MSY reference points are summarised in Fig 2. Absolute estimates of biomass and fishing mortality are uncertain because M is not well known, and the use of relative values allows the focus to be on trends and proportional changes. The reference case (black line) and the main effects, where the levels of each factor vary one by one, are shown; the ribbons delimit the range across all 1440 scenarios. The base case trajectories are in the middle of the range, as the scenarios were based on varying factors around the base case. Trajectories exhibit similar variability within a quantity, but tend not to intersect as they change at similar rates. Catches vary the most, reflecting the impact of operational and environmental factors, while SSB varies the least, as it is a modelled quantity and process error is accounted for in recruitment and selectivity. SSB estimates decreased while harvest rate and catches increased, that is, they are inversely correlated, as a large stock with low exploitation or vice versa can explain the observed catches. The yield, which represents the recorded catch, exhibits significant interannual variations, and since 2000, the catches have been above or close to MSY. Estimates are sensitive to the assumed level of M, since the scenarios for M = 0.2 and 0.4 bracket the other main scenarios. SSB remains above B_MSY, but shows a downward trend. The harvest rate follows a trend similar to the yield but with less variability. For most of the time series, the harvest rate remains below F_MSY. In recent years, some scenarios have shown that harvest rates reach or exceed F_MSY. SSB scenarios were based on, but remain above, B_MSY.

Stock status
The status of the stock in the terminal year for all 1440 scenarios is summarised in a Kobe phase plot (Fig 3). The green quadrant provides an assessment of sustainability, as it indicates a well-managed fishery where SSB/B_MSY > 1 and F/F_MSY < 1. The red zone shows where a stock is overfished and where overfishing occurs. Two sets of data points are plotted: mustard points for the 1440 deterministic model estimates within the uncertainty grid and blue points for the MCMC base case posteriors, which account for uncertainty in the model parameters. Marginal distributions, depicted along the plot's axes, allow probabilities from model estimates and the MCMC analysis to be compared, i.e. targets by the central tendency and limits by the tails. The Kobe phase plot reveals a tendency for the deterministic model estimates to fall within the red zone, indicating overexploitation. However, the MCMC posteriors cluster toward the green zone, suggesting a more sustainable stock status. The contrast highlights the role of uncertainty in stock evaluations and decision making.
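The probabilities summarised in a Kobe phase plot can be sketched as the proportion of points falling in each quadrant. Only the green and red quadrants are defined in the text; the labels for the other two quadrants, the treatment of values exactly equal to 1, and the example points are assumptions made for illustration.

```python
def kobe_quadrant(b_ratio, f_ratio):
    """Classify a point (SSB/B_MSY, F/F_MSY) into a Kobe quadrant.

    ASSUMPTION: ratios exactly equal to 1 are treated as not overfished and
    not overfishing; labels other than 'green' and 'red' are hypothetical.
    """
    overfished = b_ratio < 1.0
    overfishing = f_ratio > 1.0
    if overfished and overfishing:
        return "red"
    if not overfished and not overfishing:
        return "green"
    return "overfished only" if overfished else "overfishing only"


def kobe_probabilities(points):
    """Proportion of points in each quadrant (equal weight per point)."""
    counts = {}
    for b_ratio, f_ratio in points:
        quadrant = kobe_quadrant(b_ratio, f_ratio)
        counts[quadrant] = counts.get(quadrant, 0) + 1
    return {quadrant: n / len(points) for quadrant, n in counts.items()}


# Illustrative terminal-year estimates, not values from the grid or the MCMC.
grid_points = [(0.9, 1.2), (0.8, 1.1), (1.1, 0.9), (1.3, 1.05)]
probabilities = kobe_probabilities(grid_points)
print(probabilities)
```

Applying the same function to grid estimates and to MCMC draws for a base case gives two sets of quadrant probabilities that can be compared directly; replacing the equal weights with scenario weights is a straightforward extension.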
The current yield, F, and SSB relative to their MSY benchmarks are summarised in Fig 4 by natural mortality and steepness. The ratios are derived from the uncertainty grid, which also includes factors such as juvenile M, ESS, CPUE CV, catchability, and selectivity, although these have no significant influence. Therefore, these additional factors are integrated into the box and whisker plots. There is a compensatory relationship between steepness and M, since high steepness and low M result in outcomes comparable to those with low steepness and high M. The figure further illustrates that an increase in M is correlated with a decrease in F/F_MSY, a relationship that has implications for data-deficient situations where M is used as a substitute for F_MSY. As seen in the Kobe phase plot, there is an inverse relationship between SSB/B_MSY and F/F_MSY; scenarios with higher values of M and steepness are associated with lower fishing mortality ratios and higher spawning stock biomass ratios relative to MSY. This suggests that for model configurations with higher M and steepness, the stock is less exploited and has healthier spawning biomass compared to the MSY benchmarks.

Production functions
Plots of the relationship between equilibrium yield and equilibrium biomass, which are used to derive MSY benchmarks, are commonly called production functions. To understand the impact of the uncertainty grid on the production function, these are summarised in Fig 5 by the natural mortality rates of adults (M) and the steepness of the stock-recruitment relationship. The shading within the plots indicates the effective sample size, a measure of the amount of size information available to estimate the status of the stock. The shape and peak of these curves vary with the natural mortality rate and steepness, illustrating how sensitive the fishery's productivity is to these key life history parameters. Again, M and steepness have a large effect; increasing adult M results in higher productivity, and therefore MSY, while increasing steepness shifts the curve to the left, increasing F_MSY (since F is equivalent to catch/biomass), making the stock more resilient to fishing pressure.
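How MSY benchmarks follow from the shape of a production function can be sketched with a Pella-Tomlinson curve, used here only as a convenient stand-in for the equilibrium yield curve of an age-structured model. The parameters r, k and p and their values are hypothetical; p controls the skew of the curve, and F_MSY is taken as MSY divided by B_MSY, following the catch/biomass interpretation of F.

```python
def pella_tomlinson(biomass, r, k, p):
    """Equilibrium yield P(B) = (r / p) * B * (1 - (B / K) ** p).

    ASSUMPTION: a stand-in production function; p = 1 gives the symmetric
    Schaefer curve and p < 1 shifts the peak to the left.
    """
    return (r / p) * biomass * (1.0 - (biomass / k) ** p)


def msy_benchmarks(r, k, p):
    """Analytical benchmarks obtained by setting dP/dB = 0."""
    b_msy = k * (1.0 + p) ** (-1.0 / p)
    msy = pella_tomlinson(b_msy, r, k, p)
    return {"B_MSY": b_msy, "MSY": msy, "F_MSY": msy / b_msy}


symmetric = msy_benchmarks(r=0.4, k=1000.0, p=1.0)
left_skewed = msy_benchmarks(r=0.4, k=1000.0, p=0.25)
print(symmetric)
print(left_skewed)
```

With the other parameters held fixed, the left-skewed curve peaks at a lower biomass relative to K and has a higher F_MSY, which mirrors the qualitative effect described for increasing steepness.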

Process error
A property of the model estimates is process error, modelled in this case by recruitment deviates.

Decision tree analysis
The retrospective analysis using Mohn's ρ is summarised in Fig 8. The black line identifies the reference case, and the main effects are indicated by the coloured lines. The base case fails with the lowest score, and the scenarios with the lowest retrospective bias are for M of 0.4 and 0.2 and CPUE with a CV of 50%. The impact of factors and levels of the uncertainty grid is explored using a regression tree in Fig 9, using Mohn's ρ as the response variable. Below the regression tree, the clusters are summarised by their MASE values, production functions, and Kobe plots. Where Mohn's ρ is greater than a value of -0.15 within a cluster, the values are indicated in blue.
The key factors that influence Mohn's ρ are ESS, CPUE CV, catchability, and adult M. The analysis did not select sigmaR, steepness, or selection pattern as influential factors. Therefore, adult M and the relative weights given to the CPUE and the length composition data have the main impact. First, clusters are summarised by their MASE values, with a red vertical line

Discussion
The Kobe phase plot in Fig 3 compared the estimation error of the base case with the model error of the uncertainty grid. The blue points, which denote the MCMC posteriors for the base case, are more tightly clustered and suggest sustainable stock status, as F is around F_MSY and there is only a small probability that the stock falls below B_MSY. In contrast, the yellow points of the uncertainty grid, representing deterministic model estimates, show a wide dispersion and a high uncertainty about the current status. Variability in the predictions of the deterministic models implies that management advice is actually more uncertain than would be suggested if only the base case, or a best assessment with MCMC, had been used to provide advice. Figs 10 and 11 show that, depending on the choice of scenarios, the stock is being fished either sustainably or unsustainably, so the choice of scenarios may introduce conscious or unconscious bias. This shows the importance of having a pre-agreed procedure for selecting, rejecting, and weighting scenarios.
Although 1440 scenarios were evaluated, the uncertainty mainly affected the shape of the production function, the scale, and the level of process error. The status of the stock relative to the reference points was primarily influenced by fixed parameters, e.g., adult M and steepness, while absolute biomass was affected by the relative importance of the CPUE and length data, and the level of process error by adult M. There was confounding between the effects of steepness and natural mortality, both of which are crucial in determining sustainability, since higher natural mortality and steepness are associated with a steep slope at the origin and a skewed production function. The production function in turn determines the reference points for fishing mortality and how far below B_MSY or virgin biomass a stock can fall before productivity is impaired. The sensitivity of the assessment outputs to critical but commonly fixed biological factors such as natural mortality and steepness underscores the limitations of the best assessment if the fixed inputs do not fully capture the range of uncertainties inherent in fish stock assessments.

Plausibility
The initial choice of scenarios and subsequent rejection, acceptance, and weighting is important in determining the status of the stock and subsequent management action. However, estimates from an ensemble may be biased if the models are a subset of all plausible models, some are less likely than others, or the models are non-unique, causing redundancy [57]. Therefore, assigning the same reliability to all models could introduce bias, so ideally, each model should be assigned a weight [58] based on plausibility. Although the importance of plausibility is widely acknowledged, it is rarely formally defined. This lack of formal definition poses challenges in assessing the credibility and reliability of modelling results, potentially undermining the effectiveness of management strategies. Plausibility refers to the quality of seeming reasonable or probable based on the available evidence and logical coherence. A good example of the value of using observations not used in model fitting is that of [59], where a model was rejected based on alternative data that were not used in the assessment model, in this case fisher-derived data showing that one model was implausible. In another case, in the ICCAT bluefin assessment conducted using virtual population analysis, in which numbers alone are used, the predictions of adult biomass were inconsistent with observations of the mean size of older individuals, which identified model misspecification [60].
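One way in which plausibility could be made operational is sketched below: scenarios are first screened on retrospective bias and prediction skill, and the remaining scenarios are weighted in inverse proportion to their MASE. This is an illustrative scheme rather than the procedure used in this study. The lower bound of -0.15 for Mohn's rho and the MASE threshold of 1 follow values mentioned in the text, whereas the upper bound of 0.20, the inverse-MASE weighting and the scenario names are assumptions.

```python
def plausibility_weights(diagnostics, rho_bounds=(-0.15, 0.20), max_mase=1.0):
    """Screen scenarios, then weight the retained ones by inverse MASE.

    diagnostics: dict mapping scenario name -> {"rho": ..., "mase": ...}.
    ASSUMPTIONS: the upper rho bound and the inverse-MASE weighting are
    illustrative choices, not a method prescribed by the source.
    Returns a dict of weights summing to 1, or an empty dict if all fail.
    """
    lower, upper = rho_bounds
    retained = {
        name: d for name, d in diagnostics.items()
        if lower < d["rho"] < upper and d["mase"] < max_mase
    }
    if not retained:
        return {}
    inverse = {name: 1.0 / d["mase"] for name, d in retained.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical diagnostics for four scenarios.
diagnostics = {
    "A": {"rho": -0.05, "mase": 0.50},
    "B": {"rho": -0.30, "mase": 0.40},  # rejected: retrospective bias
    "C": {"rho": 0.10, "mase": 1.20},   # rejected: no prediction skill
    "D": {"rho": 0.02, "mase": 0.80},
}
weights = plausibility_weights(diagnostics)
print(weights)
```

Agreeing the screening thresholds and the weighting rule before the diagnostics are inspected is what makes such a scheme objective; the particular rule matters less than the fact that it is fixed in advance.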

Fisheries management advice
Managing fisheries poses challenges due to uncertainties and risks that arise from natural variability, imperfect information on aquatic ecosystems, and the inability to fully control fisheries [61]. Therefore, a stock assessment is performed to provide probabilistic statements about the status of stocks and their response to management. The risk of stock depletion or failure to consistently achieve objectives should be equivalent across all data quality categories and assessment methods [62]. The consideration of risk equivalence allows for a formal treatment of uncertainty so that management decisions can deliver consistent results [8], as required in tiered assessment frameworks, and supports a move toward an ecosystem approach to fisheries (EAF).
However, estimating probabilities in stock assessments is difficult and requires a comprehensive approach to incorporate uncertainties and associated risks. Uncertainty sources include parameters for which there is minimal information in the data, model structure, and process variability. Bayesian MCMC methods can handle parameter uncertainty, and reversible jump MCMC methods can handle uncertainty in model structure [63]. However, computational demands and the potential for misspecification of MCMC limit its application in the time frame of stock assessment working groups. Instead, assessment groups often use scenario testing as a robustness check, but they may then combine scenarios to provide advice [19], thus confusing sensitivity and uncertainty analysis.
The choice between sensitivity analysis and uncertainty analysis depends on the objectives of the assessment and the nature of the available data. Sensitivity analysis is beneficial to prioritise research efforts, while uncertainty analysis is crucial for a comprehensive understanding of potential outcomes. An objective approach for selecting, screening, and weighting hypotheses should be pre-agreed, to avoid "cherry picking". The validation of the model should be performed using a diagnostic toolbox [10] with a focus on prediction skill [64]. Prediction skill can be used to help identify and test alternative hypotheses, to compare different modelling frameworks, to explore model misspecification and data conflicts, and to weight scenarios.
Once a sensitivity analysis has been performed, it can be used to determine whether estimates from alternative models fall outside the confidence or credibility intervals of a reference case. If the estimation error is less than the model error, this can indicate a lack of information in the data. The estimation error is related to the type and quality of the data, the estimable parameters and the variance assumed for priors and observations. Therefore, a high estimation error can indicate a lack of contrast in the data or a violation of the assumptions of the model. If the model error is greater than the estimation error, then statements about achieving targets and avoiding limits are model-dependent, and so an ensemble of models should be built.
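The comparison of model error with estimation error can be sketched by computing an interval from the reference-case draws and counting how many estimates from the alternative scenarios fall outside it. The percentile interval, the 90% level, the function names and the example values are assumptions made for illustration.

```python
def percentile_interval(draws, level=0.90):
    """Central interval from posterior draws using linear interpolation."""
    ordered = sorted(draws)

    def quantile(prob):
        position = prob * (len(ordered) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


def fraction_outside(reference_draws, scenario_estimates, level=0.90):
    """Share of scenario point estimates outside the reference-case interval."""
    low, high = percentile_interval(reference_draws, level)
    outside = [x for x in scenario_estimates if x < low or x > high]
    return len(outside) / len(scenario_estimates)


# Illustrative SSB/B_MSY values: reference-case draws and scenario estimates.
reference_draws = [1.0 + 0.01 * i for i in range(101)]
scenario_estimates = [0.8, 1.2, 1.5, 2.3]
share = fraction_outside(reference_draws, scenario_estimates)
print(share)
```

A large share of scenario estimates outside the reference-case interval suggests that model error dominates estimation error, in which case probability statements based on a single model would understate the uncertainty.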
However, conducting a full uncertainty analysis is difficult, especially when there is great uncertainty. Therefore, sensitivity analysis is generally preferred to identify factors that are of high risk. For example, in this case it was found that the natural mortality of juveniles had little impact, but that of adults had a large effect on the production function, reference points, and the level of process error. The weight given to the length data determined the scale, that is, the absolute biomass and MSY. A sensitivity analysis can be used to agree Operating Models for evaluating the robustness of management strategies. However, after the implementation of an agreed management strategy, performance must be reviewed. This should be done less frequently than stock assessments used to set catch limits and, if possible, a comprehensive assessment should be conducted using an uncertainty analysis. This will provide an opportunity to learn from the implementation and apply lessons learnt in future iterations of the management cycle. By continuously refining strategies based on empirical evidence and practical experience, the management process becomes more dynamic and responsive to changing conditions and new information, helping in the move towards EAF.
In MSE, conditioning Operating Models in the form of an uncertainty grid and then integrating outcomes to derive probabilities for performance metrics might obscure critical risk assessments. For example, the risk of falling below acceptable limits is commonly found from the tails of probability distributions. Such risks may be captured more usefully through scenarios, i.e. a subset of Operating Models that embody specific uncertainties and the concerns of stakeholders. In this study, we showed that despite 1440 scenarios, there were three main outcomes: the shape of the production function, the scale, and the level of process error. An empirical management procedure could be tuned to provide robust advice on a much reduced subset of Operating Models. Therefore, our results support the use of sensitivity rather than uncertainty analysis for conditioning Operating Models, since examining the effects of varying key parameters and model structures provides a clearer identification of the scenarios that significantly influence outcomes. This helps to achieve the pragmatic goals of MSE by identifying robust management strategies and research priorities. To ensure that management advice is comprehensive and resilient, we advocate for a more deliberate and scenario-focused analysis to inform fisheries management decisions.

Model development
Developing and validating models is crucial to address the complexities and uncertainties in fisheries management. Therefore, we propose a systematic and transparent approach to support robust decisions, based on the following stages.
Identifying key uncertainties. The process begins by identifying critical uncertainties that affect stock assessments, including data quality, model structure, parameter estimation, and variability in fish population and fishing. Integrating stakeholder concerns is also increasingly vital as we transition to an EAF, ensuring that the models reflect the varied values and priorities within the fisheries system. For example, eliciting concerns, preferences, and objectives through interviews, workshops, and surveys enables the inclusion of various perspectives, such as those of fishermen, conservationists, industry representatives, and indigenous communities [65]. This approach promotes a transparent and inclusive fisheries management process that leads to sustainable and equitable results. Stakeholder participation during model development helps build ownership and trust, crucial for effective implementation of advice.
Selection of model candidates. Once uncertainties have been identified, a diverse suite of model candidates can be proposed, where each model may represent a different hypothesis regarding stock dynamics, as well as variations in life history traits, fishing pressure, environmental impacts, and ecosystem interactions. The selection depends on the focus, for example, whether it is on a single species or on an ecosystem approach as we transition toward more integrated fisheries management practices.
Development of individual models. The next stage is the development and independent diagnostic evaluation of individual models. This involves structuring the model, fixing parameters or agreeing priors, and then fitting the model to the available data.
Weighting and integration of models. We propose the following algorithm to develop the stock assessment models used to provide advice and to review its implementation.
1. Run pre-agreed diagnostics to develop a reference case.
2. Develop scenarios for plausible hypotheses that have a large effect.
3. If model error is less than estimation error, the reference case can be used for advice (i.e. best assessment paradigm).
4. If model error is greater than estimation error, agree on scenarios; an ensemble may then better reflect uncertainty and the associated risks (i.e. ensemble paradigm) and should be preferred over a best-case scenario. This will be context-sensitive, but the rationale should be clearly stated.
5. Fit scenarios and repeat diagnostics.
6. Weight model scenarios in the ensemble based on diagnostics.
A way forward is to use a discrete weight system (W(D)) based on diagnostic scores [66] to provide an estimate of plausibility based on the fit to the data.
The components W(D) can be calculated based on a series of interconnected diagnostic tests [10].
Each component W is assigned a value of 1 when the run passes the diagnostic test and a 0 if it fails. In addition, different weights could be assigned for the different diagnostic tests used. This provides an extension of current practice, and as more research is conducted on model weighting, this can be adapted.
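As a minimal sketch of such a discrete weight system, the pass/fail components can be combined into a score for each run and then normalised across the ensemble. The text does not specify how the components are aggregated, so the weighted mean used here is an assumption. The function `diagnostic_weights`, the diagnostic test names, and the run names are hypothetical.

```python
def diagnostic_weights(results, test_weights=None):
    """Return normalised ensemble weights from pass/fail diagnostics.

    results: {run_name: {test_name: True/False}}, each run with at least one test
    test_weights: optional {test_name: relative importance}; defaults to equal.
    The weighted-mean aggregation is an assumption made for illustration.
    """
    scores = {}
    for run, tests in results.items():
        weights = {t: 1.0 for t in tests} if test_weights is None else test_weights
        total = sum(weights[t] for t in tests)
        # Each component is 1 if the run passes the test and 0 if it fails
        passed = sum(weights[t] * (1 if ok else 0) for t, ok in tests.items())
        scores[run] = passed / total
    norm = sum(scores.values())
    # If every run fails every test, no run receives any weight
    return {run: (s / norm if norm > 0 else 0.0) for run, s in scores.items()}


# Hypothetical diagnostic outcomes for three scenarios
runs = {
    "reference": {"convergence": True, "residuals": True,
                  "retrospective": True, "hindcast": True},
    "high_adult_M": {"convergence": True, "residuals": False,
                     "retrospective": True, "hindcast": False},
    "low_steepness": {"convergence": False, "residuals": False,
                      "retrospective": False, "hindcast": False},
}
equal = diagnostic_weights(runs)
# Giving the hindcast (prediction skill) test three times the weight of the others
skill_heavy = diagnostic_weights(
    runs,
    {"convergence": 1, "residuals": 1, "retrospective": 1, "hindcast": 3},
)
print(equal)
print(skill_heavy)
```

A run that fails every test receives zero weight and is effectively rejected from the ensemble. A stricter alternative would be to multiply the components, so that failing any single test rejects the run.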
Management advice. Models should be validated against independent data to evaluate the robustness and reliability of their outputs as part of an iterative process of development and refinement to maintain the relevance and accuracy of the models as new data emerge and our understanding of ecosystems evolves. This allows decision makers to account for uncertainties and risks. This structured approach, by continuously integrating and comparing various models, supports adaptive fisheries management.

Conclusions
The three paradigms of best assessment, model ensemble, and MSE are all critical to adopting a precautionary approach to fisheries management, as they require the quantification of uncertainties and the associated risks that may impede the achievement of management goals. Statements of plausibility need to be supplemented by judgments of imprecise probability and knowledge strength. That an event or scenario is plausible is a vague statement, and a scientific approach requires precision on both likelihood and knowledge [67]. Model validation, using prediction skill, provides a rigorous assessment of the plausibility of models and scenarios using empirical evidence, thus ensuring that advice is both credible and reliable. Ensemble models, which incorporate an extensive array of plausible scenarios, will in turn enhance our ability to capture the inherent variability and uncertainty characteristic of fish stock assessments.
Moving toward an EAF requires continuous refinement and adjustments in response to evolving data and emerging insights, ensuring that management strategies take into account changing conditions and incorporate the latest evidence, thus improving their effectiveness. The incorporation of prediction skill as a metric for validation across all paradigms will improve the robustness of the management advice provided. Prediction skill provides an objective framework for the selection, rejection, and weighting of models or Operating Models, ensuring that they are based on plausible hypotheses. By adhering to the principles of risk equivalence and embracing model validation through prediction skill, fisheries management can develop strategies that are not only resilient and sustainable, but also informed by a deep understanding of ecosystem dynamics and supported by empirical validation.

Fig 2. Time series of yield, harvest rate, and spawning stock biomass, relative to MSY reference points, for the main effects in the uncertainty grid; thick black line is the base case. https://doi.org/10.1371/journal.pone.0302576.g002

Fig 3. Kobe phase plot showing spawning biomass and fishing mortality, relative to MSY target reference points. Yellow points correspond to all the deterministic model estimates in the uncertainty grid, and blue points to the MCMC posteriors for the base case. https://doi.org/10.1371/journal.pone.0302576.g003

Fig 6 shows the recruitment deviates and Fig 7 the process error. M has the greatest effect, since if adult M is large, then recruitment and variability in the strength of the year class have a great effect on biomass. Steepness and other factors have less of an impact.

Fig 4. Deterministic estimates in the final year from the uncertainty grid of biomass, harvest rate, and yield relative to MSY reference points; summarised by adult M and steepness. https://doi.org/10.1371/journal.pone.0302576.g004